Topological Centrality and Its Applications
Abstract. Recent developments in network structure analysis show that it plays an important role in characterizing complex systems
in many branches of science. Different from previous network centrality measures, this paper proposes the notion of topological
centrality (TC), which reflects the topological positions of nodes and edges in general networks, and proposes an approach to calculating
it. The proposed topological centrality is then used to discover communities and build the backbone network.
Experiments and applications on a research network show the significance of the proposed approach.
1 Introduction

The rich-get-richer phenomenon exists in many
complex networks like the World Wide Web. It is
known that there are two ways for a node to become
richer: connecting to more nodes, and connecting to
more important nodes.

We observe that a node may earn more by connecting to
one important node than by connecting to many but less important
nodes, and that both nodes and edges play an important role
in forming network centrality.

Existing centrality measures focus on nodes. They
cannot explain the topological characteristic of centrality.
This paper explores a new network centrality called
topological centrality.

Various centrality measures are defined on a graph
$G = (V, E)$, where $V$ is the vertex set, $E$ is the edge set,
$|V| = n$, and $|E| = m$.

The authority and hub scores reflect the in-degree and out-degree
characteristics of a node in the Web, respectively [1].
The idea of HITS is that a good hub links to many
authorities, while a good authority is linked by many
good hubs. Nodes with the highest authority or hub score in
the Web graph act as authority centers and hub centers.
The authority and hub of a node are calculated by:

$$a(i) = \sum_{j:\,(j,i) \in E} h(j), \qquad h(j) = \sum_{i:\,(j,i) \in E} a(i),$$

where $a(x)$ and $h(x)$ are the authority and hub of node
$x \in V$, respectively.
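The two HITS updates can be iterated until the scores settle. The sketch below is only an illustration, not the procedure of [1]: the function name `hits`, the fixed iteration count, and the normalization by the maximum score are assumptions made here to keep the values bounded.

```python
def hits(num_nodes, edges, iterations=50):
    """Iterate the HITS updates on a directed edge list of (source, target) pairs."""
    hub = [1.0] * num_nodes
    auth = [1.0] * num_nodes
    for _ in range(iterations):
        # a(i) = sum of h(j) over all edges j -> i
        new_auth = [0.0] * num_nodes
        for j, i in edges:
            new_auth[i] += hub[j]
        # h(j) = sum of a(i) over all edges j -> i, using the fresh authority scores
        new_hub = [0.0] * num_nodes
        for j, i in edges:
            new_hub[j] += new_auth[i]
        # Assumed normalization: divide by the maximum so that scores stay in [0, 1].
        max_a = max(new_auth) or 1.0
        max_h = max(new_hub) or 1.0
        auth = [a / max_a for a in new_auth]
        hub = [h / max_h for h in new_hub]
    return auth, hub


# Hypothetical toy graph: node 0 links to 2 and 3, node 1 links to 2.
edges = [(0, 2), (1, 2), (0, 3)]
auth, hub = hits(4, edges)
```

In this toy graph node 2 is linked by both hubs and ends up as the authority center, while node 0 links to both authorities and ends up as the hub center.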

Degree centrality describes the degree information of
each node [2] [3]. It is based on the idea that more
important nodes are more active, that is, they have more
neighbors in the graph. Degree centrality can be used to
find the core nodes of a community; however, it only
considers the hub characteristic and ignores the authority
characteristic. Degree centrality $C_D(v)$ for a vertex $v$ is
calculated as follows:

$$C_D(v) = \frac{\deg(v)}{n-1}.$$

Calculating degree centrality for all nodes in a graph
takes $O(n^2)$ with a dense adjacency matrix representation
of the graph, while for a sparse graph stored as an edge list $E$ the
time complexity is $O(m)$. Similar to degree centrality,
an approach based on the in- and out-degrees of nodes was
proposed to improve the efficiency of information
propagation in P2P networks [4].

Betweenness centrality describes how frequently a node
lies on the shortest paths between two indirectly connected
nodes [2] [5] [6]. It is based on the idea that if
more nodes are connected via a node, then the node is
more important. Betweenness centrality can be used to
find the edges between two communities in a complex
network. Betweenness centrality $C_B(v)$ for a vertex $v$ is:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v) / \sigma_{st}}{(n-1)(n-2)},$$

where $\sigma_{st}$ is the number of shortest geodesic paths from
$s$ to $t$, and $\sigma_{st}(v)$ is the number of shortest geodesic
paths from $s$ to $t$ that pass through vertex $v$. The
shortest paths between each pair of nodes in a graph
can be found by the Floyd-Warshall algorithm with time
complexity $O(n^3)$ [7], so the time complexity of betweenness
centrality is also $O(n^3)$. Betweenness centrality has
been used to study the community structure of social and
biological networks [8].

Closeness centrality describes the efficiency of information
propagation from one node to the other nodes
[2] [9] [10]. It is based on the idea that if a node can
quickly reach others, then the node is central. Closeness
centrality can be regarded as a measure of how long
it will take information to spread from a given vertex
to the other reachable vertices in the network. Closeness
centrality is defined as the mean geodesic distance (i.e.,
the shortest path) between a vertex $v$ and all other vertices
reachable from $v$:

$$C_C(v) = \frac{\sum_{t \in V \setminus \{v\}} d_G(v, t)}{n-1},$$

where $n \ge 2$ is the size of the network's connected
component reachable from $v$. Calculating the closeness
centrality for each node in the graph has time complexity
$O(n^3)$.
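For an unweighted graph the mean geodesic distance of one vertex can be obtained with a single breadth-first search. The sketch below assumes an adjacency-list dictionary; the function name and the toy path graph are illustrative only. Under this definition a smaller value means a more central vertex; the reciprocal, $(n-1)/\sum_t d_G(v,t)$, is also in common use and makes larger values mean more central.

```python
from collections import deque


def mean_geodesic_distance(adj, v):
    """BFS from v; return the mean distance to every other reachable vertex."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    reachable = len(dist)  # n in the text: size of v's component, v included
    if reachable < 2:
        return float("inf")  # isolated vertex: no other vertex to reach
    return sum(dist.values()) / (reachable - 1)


# Hypothetical path graph a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
closeness = {v: mean_geodesic_distance(adj, v) for v in adj}
```

Here the end vertex `a` has mean distance (1 + 2 + 3) / 3 = 2, while the inner vertex `b` has (1 + 1 + 2) / 3, so `b` is the more central of the two.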

Eigenvector centrality describes the importance of nodes
according to the adjacency matrix of a connected graph
[11]. It assigns relative scores to all nodes in the network
based on the principle that connections to high-scored
nodes contribute more to the score of a node than
connections to low-scored nodes. PageRank is a variant
of the eigenvector centrality measure [12].

Information centrality describes a node's influence on the
network efficiency of information propagation [13]. The
network efficiency is defined by

$$E[G] = \frac{\sum_{i \neq j \in G} \epsilon_{ij}}{n(n-1)} = \frac{1}{n(n-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}},$$

where the efficiency $\epsilon_{ij}$ in the communication between
two points $i$ and $j$ is equal to the inverse of the shortest
path length $d_{ij}$. The information centrality of a vertex $v$
is defined as the relative drop in the network efficiency
caused by the removal from $G$ of the edges incident with $v$:

$$C_I(v) = \frac{\Delta E}{E} = \frac{E[G] - E[G'_v]}{E[G]},$$

where $G'_v$ indicates the network obtained by removing the edges
incident with node $v$ from $G$. Information centrality has
been used to study the structures of communities in
complex networks [14].
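The relative efficiency drop can be computed directly from this definition. The following sketch assumes an unweighted, undirected graph given as an edge list; the function names and the three-node example are illustrative and not taken from [13].

```python
from collections import deque


def efficiency(nodes, edges):
    """E[G]: average of 1/d_ij over all ordered pairs i != j."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(nodes)
    total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        # Unreachable pairs have d = +infinity and contribute 1/d = 0.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def information_centrality(nodes, edges, v):
    """Relative drop in efficiency after removing the edges incident with v."""
    base = efficiency(nodes, edges)
    pruned = [(a, b) for a, b in edges if v not in (a, b)]
    return (base - efficiency(nodes, pruned)) / base


# Hypothetical path graph a - b - c
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
c_b = information_centrality(nodes, edges, "b")
c_a = information_centrality(nodes, edges, "a")
```

Removing the edges of the middle vertex `b` disconnects everything, so its information centrality is 1; removing the edges of `a` leaves the pair `b`, `c` connected, so the drop is smaller.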

2 Topological Centrality 

2.1 Definition 

In a dynamic network, the weights of nodes and the
weights of edges influence each other and keep
changing. Each round of mutual influence between pairs of
nodes is called one iteration. If the order of
the nodes' weights stays unchanged after many
iterations, the network reaches the steady state and the
nodes with the highest weights are called topological
centers. An undirected graph may have one or more
topological centers. The number of topological centers is
decided by the graph structure. An undirected network
may have one of the following structures.

1. A network with circular structure has n (n > 3)
topological centers as shown in Fig. 1a.

2. A network with symmetric structure has two
topological centers as shown in Fig. 1b.

3. Otherwise, the network has a unique topological
center as shown in Fig. 1c.





Fig. 1. Three types of topological structures: (a) circular
structure; (b) symmetric structure; (c) general structure.
The darker the node, the higher its topological centrality. The black
nodes are the topological centers. Networks with a circular
structure have n (n > 3) topological centers; networks
with a symmetric structure have 2 topological centers; other
networks have 1 topological center.



In an undirected graph, the length of the shortest path
between two nodes is the geodesic distance
between them. In particular, if two nodes are mutually unreachable,
then their geodesic distance is $+\infty$. Geodesic distance
can be used to find the nearest topological center of a
node.

When a network is in the steady state, the topological
centrality (TC) of a node is the ratio of its weight to the
largest node weight. The topological centers therefore have
TC equal to 1. The topological centrality of an edge
is the ratio of its weight to the largest weight.

The TC of a node reflects the geodesic distance from the
node to its nearest topological center. The TC of an edge
reflects the geodesic distance from the edge to its nearest
topological center. The higher the TC of a node or edge,
the closer it is to the nearest topological center.

2.2 Calculating Topological Centrality 

Hypothesis 1. The topological centrality of a node is positively
influenced by the topological centrality degrees of its
neighbor nodes.

Hypothesis 1 leads to the following characteristics:

1. a node connecting to nodes with higher TC degrees 
gets higher TC degree; and, 

2. a node connecting to more nodes gets higher TC 
degree. 

Hypothesis 2. If two nodes of an edge have higher TC 
degrees, then the edge has higher TC; and, if an edge has 
higher TC, then its two nodes also have higher TC degrees. 

Hypothesis 2 leads to the following characteristics:

1. nodes closer to the topological center have higher
TC degrees; and,

2. edges closer to the topological center have higher
TC degrees.

These characteristics reflect that nodes
with higher TC degrees are incident with edges
having higher TC degrees.

The two hypotheses can be represented by:

$$\begin{aligned}
\omega(n)\uparrow &= \omega(n) + \sum_{i} g\big(\omega(link(n, n_i))\uparrow,\ \omega(n_i)\uparrow\big), \\
\omega(l)\uparrow &= f\big(\omega(l_s)\uparrow,\ \omega(l_t)\uparrow\big),
\end{aligned} \qquad (1)$$

where $n$ is a node, $n_i$ are the neighbors of $n$, $\omega(link(n, n_i))$
is the weight of the link between $n$ and $n_i$; $l$ is a link, $l_s$ and
$l_t$ are the source and target nodes of $l$ respectively; $f$ and
$g$ are two functions, and $\uparrow$ means the positive correlative
relation.

During the calculation of TC degrees, the
weights of nodes and edges increase after each
iteration, but the descending order of the node
weights converges to a steady state. The weights of
nodes can be normalized by dividing by the largest node
weight. If the normalized weights of nodes converge,
the descending order of the nodes' weights stays
unchanged, and the edges' weights also converge. The
converged node weights and edge weights are the TC
degrees of nodes and links respectively.

Normalization of the weights of nodes has the following
characteristics:

1. If the normalized weights of nodes converge, then
the descending order of nodes by weight
also converges. The normalization
process does not change the order of the weights of
nodes; it only maps the weights of nodes
onto the interval (0, 1].

2. If the normalized weights of nodes converge, the
weights of edges also converge. According to the
definition of the TC of an edge, the weight of an edge
is the sum of the weights of its two end nodes.
Since the normalized weights of nodes converge,
the weights of the incident edges also converge.

3. If the normalized weights of nodes converge, then
the TC degrees of edges converge. This follows
because the normalization of the weights of edges
just maps them onto the interval
(0, 1] and keeps their order.

We propose the following approach to calculating the
TC in a connected network. Suppose a connected graph
$G = (V, E)$ with $n$ ($n > 1$) nodes and $m$ ($m \ge n - 1$)
edges, $V = \{v_1, v_2, \ldots, v_n\}$, $E = \{e_1, e_2, \ldots, e_m\}$, and the
corresponding adjacency matrix is $A$. The element of $A$
is $a_{ij}$, and

$$a_{ij} = \begin{cases} 1 & \{i, j\} \in E \\ 0 & \{i, j\} \notin E \end{cases}$$

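The following formula implements the iterative calculation
of the topological centrality of nodes and edges,
where $temp\_\omega_i$ and $\omega_i$ are the weights of $v_i$ before and
after normalization, $temp\_\omega_{e(i,j)}$ and $\omega_{e(i,j)}$ are the
weights of edge $e(i,j)$ before and after normalization,
and $t$ is the iteration number.

$$\begin{aligned}
temp\_\omega_i^{(t+1)} &= \omega_i^{(t)} + \sum_{j=1}^{n} a_{ij}\, \omega_{e(i,j)}^{(t)}\, \omega_j^{(t)}, \\
temp\_\omega_{e(i,j)}^{(t+1)} &= temp\_\omega_i^{(t+1)} + temp\_\omega_j^{(t+1)}.
\end{aligned} \qquad (2)$$

The following formulas normalize the TC degrees of
nodes and links.

$$\omega_i^{(t+1)} = \frac{temp\_\omega_i^{(t+1)}}{\max_{1 \le k \le n} temp\_\omega_k^{(t+1)}}, \qquad
\omega_{e(i,j)}^{(t+1)} = \frac{temp\_\omega_{e(i,j)}^{(t+1)}}{\max_{e \in E} temp\_\omega_{e}^{(t+1)}}. \qquad (3)$$

The iterative calculation terminates if the following
conditions are satisfied:

$$\sum_{i=1}^{n} \big(\omega_i^{(t+1)} - \omega_i^{(t)}\big)^2 < \epsilon_N, \qquad
\sum_{j=1}^{m} \big(\omega_{e_j}^{(t+1)} - \omega_{e_j}^{(t)}\big)^2 < \epsilon_M. \qquad (4)$$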


Algorithm 1 calculates the weights of nodes and links
iteratively, where MAX, $\epsilon_N$ and $\epsilon_M$ control the number of
iterations.

The time complexity of Algorithm 1 is O(MAX(n +
m)). At the initializing stage, all the weights of nodes
are set to 1. If the weights of edges are not given,
then all the weights of edges are set to 1. After
initialization, the weight of a node in the next iteration is the
sum of the weights of its neighbor nodes and its own weight;
the weight of an edge is then the sum of the weights of its two end nodes.
The weights therefore become larger than
the initial values, so in each iteration the weights of nodes and edges are
normalized by dividing by the maximum weight of nodes
and of edges, respectively.

Algorithm 1 has two termination conditions: one is the
maximum number of iterations MAX; the other is the pair of
squared-deviation thresholds, $\epsilon_N$ for the weight differences of nodes
and $\epsilon_M$ for the weight differences
of edges. After Algorithm 1 stops, the nodes with
weight 1 are the topological centers. The weight of a
node is its topological centrality: the larger the weight
of a node, the closer the node is to the nearest topological
center.
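A minimal Python sketch of the iteration described above is given here for illustration. It follows one reading of the description (node update: own weight plus edge-weighted neighbor weights; edge update: sum of the two end-node weights; each normalized by its maximum) and is not presented as the authors' reference implementation. The function name, the 0-based node ids and the default thresholds are assumptions.

```python
def topological_centrality(n, edges, weights=None, max_iter=100,
                           eps_n=1e-3, eps_m=1e-3):
    """edges: list of (u, v) pairs with 0-based node ids in a connected graph.

    Returns (node_w, edge_w), both normalized into (0, 1].
    """
    m = len(edges)
    node_w = [1.0] * n
    edge_w = list(weights) if weights is not None else [1.0] * m
    node_sum, edge_sum, count = float(n), float(m), 0
    while count < max_iter and (node_sum > eps_n or edge_sum > eps_m):
        old_node, old_edge = node_w, edge_w
        # Node update: own weight plus edge-weighted weights of the neighbors.
        temp = list(old_node)
        for e, (u, v) in enumerate(edges):
            temp[u] += old_edge[e] * old_node[v]
            temp[v] += old_edge[e] * old_node[u]
        top = max(temp)
        node_w = [w / top for w in temp]
        # Edge update: sum of the two end-node weights, then normalize.
        temp_e = [node_w[u] + node_w[v] for u, v in edges]
        top_e = max(temp_e)
        edge_w = [w / top_e for w in temp_e]
        # Squared deviations used by the termination test.
        node_sum = sum((a - b) ** 2 for a, b in zip(node_w, old_node))
        edge_sum = sum((a - b) ** 2 for a, b in zip(edge_w, old_edge))
        count += 1
    return node_w, edge_w


# Hypothetical 5-node path 0 - 1 - 2 - 3 - 4; node 2 is the middle node.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
node_w, edge_w = topological_centrality(5, path_edges)
center = node_w.index(max(node_w))
```

On this path the middle node ends with weight 1 and is therefore the topological center, and the weights fall off symmetrically toward both ends.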

Table 1 makes a comparison between the topological
centrality and other centrality measures.

TABLE 1
Comparison of different centrality measures

Centrality Measure        Time Complexity    About Node or Edge
degree centrality         O(n^2)             node
betweenness centrality    O(n^3)             node or edge
closeness centrality      O(n^3)             node
eigenvector centrality                       node
information centrality                       node
topological centrality    O(K(n + m))        node and edge

2.3 Experiments 

2.3.1 Convergence Experiment

We carry out experiments on several types of network to 
verify the convergence of the algorithm. Fig. 2 shows the 
experiment results of iterative TC calculation for nodes 
and links in different structured networks with different 
scales: (a) Watts-Strogatz small-world network with n = 
1000 and m = 5000; (b) ring network with n = 1000 
and m = 1000; (c) lattice network with n = 100 and 
m = 180; (d) full network with n = 30 and m = 435; 
Algorithm 1 Calculating topological centrality degrees of nodes and edges

Require: node number n, edge number m, edges like (linkNum, startNode, endNode, weight), iteration limit
MAX, squared-deviation limit ε_N of the weight difference of nodes, squared-deviation limit ε_M of the weight difference of
links;

  nodeWeight[1..n] ← 1, count ← 0, nodeSum ← n, edgeSum ← m
  while (count < MAX) and ((nodeSum > ε_N) or (edgeSum > ε_M)) do
    oldNodeWeight[1..n] ← nodeWeight[1..n]
    oldEdgeWeight[1..m] ← edgeWeight[1..m]
    nodeWeight[1..n] ← (nodeWeight[1..n] + Σ_neighbors edgeWeight × nodeWeight) / max(nodeWeight)
    edgeWeight[1..m] ← (sum of nodeWeight of the two end nodes) / max(edgeWeight)
    nodeSum ← Σ_{i=1..n} (nodeWeight[i] − oldNodeWeight[i])^2
    edgeSum ← Σ_{i=1..m} (edgeWeight[i] − oldEdgeWeight[i])^2
    count ← count + 1
  end while

  return nodeWeight[1..n] and edgeWeight[1..m]



(e) Erdős-Rényi random graph with n = 1000, p = 0.02,
and m = 10045. Experiment results show that the TC
degrees of nodes and links converge after a number
of iterations that is related to n, m, $\epsilon_N$ and $\epsilon_M$.

2.3.2 Comparison of Centrality Measures 

Different centrality measures such as degree centrality,
betweenness centrality, closeness centrality and information
centrality are compared in [15]. Here we add
two extra centrality measures: one is the PageRank of a
node as an instance of eigenvector centrality, the other is
the topological centrality we propose. The comparison
is based on Fig. 3, which is a tree with 16 vertices.
Table 2 shows the different centrality degrees of the vertices
in Fig. 3. The experiment results show the following
characteristics:

1. Degree centrality is a local centrality: it only
records the degrees of nodes without any global
information. Nodes 1, 2, and 3 have degree 5, nodes
7 and 12 have degree 2, and the other nodes
have degree 1. Degree centrality is normalized by
the number of edges, 15.

2. Closeness centrality gives a result similar to information
centrality. The difference is in the relative order of
nodes {1, 3} and {7, 12}: the information
centrality degrees of vertices 1 and 3 are larger than those of 7
and 12, because information centrality concentrates
on network efficiency, and removing 1 or 3 reduces the
network efficiency more than removing 7 or 12.

3. The PageRank result is far from the other measures. Nodes
1 and 3 are the two centers in PageRank, and node 2
has a lower PageRank than nodes 1 and 3, because
the authority of each of nodes 7 and 12 is divided into
two parts, while nodes 1 and 3 each have four neighbors
that contribute all of their authority values to
them. Nodes 7 and 12 have
higher rank values than nodes 9, 10 and 11, because
they have more neighbors.

4. Betweenness centrality reflects how frequently
nodes occur on the shortest paths between indirectly
connected node pairs. However, betweenness
centrality has the worst resolution of nodes. Node
2 has the highest betweenness centrality, nodes 1, 3,
7, and 12 share the second-highest value, and
all the others have betweenness centrality 0.

5. Topological centrality combines the degree information
and the neighbor weight information. It has
the characteristics of both degree centrality and PageRank.
Node 2 is the topological center of the graph.
Nodes 7 and 12 have higher TC degrees than nodes
9, 10 and 11 because they have extra neighbors.
Nodes 1 and 3 follow nodes 9, 10 and 11, and
then come the remaining vertices. The order of node TC degrees
correctly reflects the geodesic distances between the nodes and
the topological center.




Fig. 3. A simple case (a tree with 16 nodes) for the
comparison of centrality measures.



2.3.3 Topological Centrality Distributions on Research 
Network 

Here the DBLP dataset is used to study the structure and
discover communities in heterogeneous networks. It contains
part of the metadata of papers provided by DBLP in
XML format. The number of papers is 664,188, and the
number of citation relations is 79,128. The heterogeneous
research network is based on the DBLP dataset. The
resource types are papers, researchers and conferences.
The semantic links are authorOf between researcher and
Fig. 2. Topological centrality convergence experiments (MAX = 100, $\epsilon_N = \epsilon_M = 0.001$): the left column lists networks
of several structures; the middle column lists the node convergence records (x-axis is the number of iterations, and y-axis is
the normalized weights of nodes); and the right column lists the link convergence records (x-axis is the number of iterations, and
y-axis is the normalized weights of links). (a) Watts-Strogatz small-world network with n = 1000 and m = 5000, converging after 14
iterations; (b) ring network with n = 1000 and m = 1000, converging after 2 iterations; (c) lattice network with n = 100 and
m = 180, converging after 17 iterations; (d) full network with n = 30 and m = 435, converging after 2 iterations; (e) Erdős-Rényi
random graph with n = 1000, p = 0.02, m = 10045, converging after 17 iterations.
TABLE 2

Comparison between topological centrality and other
centrality measures

v     C_I(v)    C_D(v)    C_C(v)    C_B(v)    PR(v)     log(C_T(v))
2     0.591     0.333     0.455     0.714     0.153      0.0
7     0.389     0.133     0.405     0.476     0.063     -0.755
12    0.389     0.133     0.405     0.476     0.063     -0.755
9     0.116     0.067     0.319     0.000     0.035     -0.827
10    0.116     0.067     0.319     0.000     0.035     -0.827
11    0.116     0.067     0.319     0.000     0.035     -0.827
1     0.444     0.333     0.349     0.476     0.161     -2.454
3     0.444     0.333     0.349     0.476     0.161     -2.454
4     0.106     0.067     0.263     0.000     0.037     -5.718
5     0.106     0.067     0.263     0.000     0.037     -5.718
6     0.106     0.067     0.263     0.000     0.037     -5.718
8     0.106     0.067     0.263     0.000     0.037     -5.718
13    0.106     0.067     0.263     0.000     0.037     -5.718
14    0.106     0.067     0.263     0.000     0.037     -5.718
15    0.106     0.067     0.263     0.000     0.037     -5.718
16    0.106     0.067     0.263     0.000     0.037     -5.718



paper, coauthor between researchers, publishedIn between
paper and conference/journal, and cite between papers.

The research network contains 1,084,198 semantic
nodes and 2,153,385 semantic links. The iteration
limits are MAX = 40 and $\epsilon_N = \epsilon_M = 200$. The
distribution of TC degrees of nodes is shown in Fig. 4.
It shows that far more resources have low TC degrees
than have high TC degrees.



Fig. 4. Topological centrality distributions (x-axis: topological centrality, log scale).



3 Application: Discovering Research 
Communities 

3.1 Research Community 

Research communities are formed by relations among
researchers, papers, projects, and research activities.
The differences between research communities and graph-based
communities are as follows.

1. Research communities are dynamically formed by
research activities such as applying (e.g., for funding
and positions), cooperating, publishing, and citing.
Communities in general complex networks are
viewed from connections (nodes within a community
are linked more densely than nodes across
communities).

2. Research communities contain multiple types of
nodes (researchers and papers can play different
roles in research activities as discussed in [15])
and relations (e.g., coauthor relation and citation
relation). Graph-based communities do not distinguish
between types of nodes or edges.

Among existing centrality measures, only
PageRank considers the influences between neighbor nodes,
and the authority of a node is divided among its neighbors.
However, PageRank does not reflect different influences
of edges, that is, all the weights of edges are 1. In a
research network, collaborations between authority researchers
are more important, and citations between authority papers are
more important.

Topological centrality can distinguish the roles of
different nodes in a research network well. (1) Nodes in a
network elect the core nodes by a voting-like mechanism:
a node connecting to more nodes is more likely to be a
local core node. After a certain number of iterations, the
local core nodes and the global topological centers are
elected. The topological centers are the nodes connecting
to the most core nodes with higher TC degrees. (2) Edges
may play different roles in the mutual influence between
the TC degrees of nodes. This confirms the phenomena
of research communities: a researcher cooperating with
authority researchers will be closer to the centers of a
research community; a paper citing (citing may not be
true) or cited by authority papers is more likely
to be closer to the core papers on a research topic.

3.2 Roles of Nodes 

Nodes can play different roles according to their topological
positions in communities: core node, margin node, bridge
node and mediated node.

1. Core nodes are usually hubs or authorities in the
community;

2. Margin nodes belong to one community, and they
have few connections to other nodes in the
community;

3. Bridge nodes connect to two or more communities,
and they usually have an equal number of connections
to two or more communities; and,

4. Other nodes except the core nodes, margin nodes and
bridge nodes are mediated nodes.

The proposed topological centrality can be used to
distinguish the roles of nodes. For example, Fig. 5 contains three
communities: C1 = {1, 4, 5, 6, 7, 8}, C2 = {2, 7, 9, 10, 11, 12}
and C3 = {3, 12, 13, 14, 15, 16}. Nodes 1, 2 and 3 are the
core nodes of C1, C2 and C3 respectively; nodes 7 and
12 are bridge nodes; nodes 4, 5, 6 and 8 are margin nodes
of C1; nodes 9, 10 and 11 are margin nodes of C2; and
nodes 13, 14, 15 and 16 are margin nodes of C3.

Nodes can be classified by TC degrees. 
Fig. 5. Distinguishing roles of nodes with topological
centrality degrees.



1. If the TC degree of a node is larger than that of 
most of its neighbors, then the node is a core node; 

2. If the TC degree of a node is no larger than the TC 
degrees of all of its neighbors, then the node is a 
margin node; 

3. If the number of neighbors with lower TC degrees
equals the number of neighbors with higher TC
degrees, then the node is a bridge node;

4. Otherwise, the node is a mediated node. 

Let $\alpha = \#L(n)/\#N(n)$ and $\beta = \#H(n)/\#N(n)$,
where $n$ is a node, $\#L(n)$ is the number of neighbor
nodes of $n$ with TC degrees lower than $n$, $\#H(n)$ is the
number of neighbor nodes of $n$ with TC degrees higher
than $n$, and $N(n)$ is the set of neighbors of $n$. Then the role of $n$ is
distinguished by

$$role(n) = \begin{cases}
\text{core node} & \alpha \ge threshold(core) \\
\text{margin node} & \alpha = 0 \\
\text{bridge node} & \alpha = \beta \\
\text{mediated node} & \text{otherwise}
\end{cases}$$

where $threshold(core) \in (0.5, 1]$ controls the number of
core nodes.
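The role rule can be sketched in a few lines of Python. The sketch below makes several assumptions: the core test is read as $\alpha \ge threshold(core)$ so that a threshold of 1 stays attainable, the threshold value 0.6 is arbitrary, and the edge list of the tree is inferred from the communities and degrees described in the text. The TC values are the log values reported in Table 2, which preserve the ordering of the TC degrees.

```python
def role(node, tc, adj, threshold_core=0.6):
    """Classify a node from the TC degrees of its neighbors."""
    neighbors = adj[node]
    lower = sum(1 for x in neighbors if tc[x] < tc[node])   # #L(n)
    higher = sum(1 for x in neighbors if tc[x] > tc[node])  # #H(n)
    alpha = lower / len(neighbors)
    if alpha >= threshold_core:
        return "core"
    if lower == 0:          # alpha = 0
        return "margin"
    if lower == higher:     # alpha = beta
        return "bridge"
    return "mediated"


# Assumed edge list of the 16-node tree (inferred from the text, not given explicitly).
tree_edges = [(1, 4), (1, 5), (1, 6), (1, 8), (1, 7), (2, 7), (2, 9), (2, 10),
              (2, 11), (2, 12), (3, 12), (3, 13), (3, 14), (3, 15), (3, 16)]
adj = {v: [] for v in range(1, 17)}
for u, v in tree_edges:
    adj[u].append(v)
    adj[v].append(u)

# log(TC) values as reported in Table 2.
tc = {v: -5.718 for v in range(1, 17)}
tc.update({2: 0.0, 7: -0.755, 12: -0.755, 9: -0.827, 10: -0.827, 11: -0.827,
           1: -2.454, 3: -2.454})

roles = {v: role(v, tc, adj) for v in adj}
```

With these inputs nodes 1, 2 and 3 come out as core nodes, nodes 7 and 12 as bridge nodes, and all leaves as margin nodes, which agrees with the roles listed for Fig. 5.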

A node is a core node because it connects to more
nodes or more important nodes. Whether a node is a core node
is decided by whether it has a larger TC degree
than its neighbors. However, the topological centers of
a connected network may be exceptions. In Fig. 5, node
2 is both the topological center and a core node, but
the ellipse node in Fig. 6 is the topological center, and
it is not a core node but a bridge node, although it
has a higher TC degree than all of its neighbors. So it is
important to distinguish the roles of topological centers.
If the neighbors of a topological center are all core nodes,
then the topological center is a bridge node; otherwise the
topological center is a core node.

Researchers and papers may play such roles as source, 
authority, bee, hub and novice [15]. The source, authority, 
and hub may be core nodes; bee nodes are often bridge 
nodes; and the novice may be the margin nodes or bridge 
nodes. 

In research network, a research group's leader usually 
has more publications and cooperators. Correspond- 
ingly, they have more coauthor relations connecting to 




Fig. 6. The ellipse node is a topological center, and it is 
not a core node but a bridge node. 



other researchers in the coauthor network. If each
research group is regarded as a community, the research
group's leaders are the core nodes. New students
have few publications and cooperators, so they are the
margin nodes in the coauthor network. Visiting researchers
and newly employed researchers are bridge nodes,
because they have cooperators in different research
communities. After the core nodes, the margin nodes and
the bridge nodes are distinguished, the remaining nodes are
mediated nodes. Usually, mediated nodes belong to only one
community.

In citation network, core nodes are the authority or 
hub papers having more citations than others; the mar- 
gin nodes are the novice papers or newly published 
papers; and the bridge nodes connect two or more paper 
clusters. Each paper cluster may belong to a specific 
research topic or discipline. 

Funding decision-making and research promotion 
need to evaluate researchers and their papers. Topologi- 
cal centrality can help distinguish the roles of researchers 
and papers, and the roles can be used to evaluate 
researchers and papers. TC degrees in the coauthor 
network help evaluate researchers, while TC degrees in 
citation network help evaluate papers. 

In a research network, the roles of nodes change year by
year. In the coauthor network, a novice researcher may
become an authority, a hub or even a bridge. With more
papers published, the TC degree of a node in a coauthor
network becomes higher than those of its neighbors, and then
the researcher becomes an authority or hub. By cooperating
with researchers in different research groups or even
different communities, a researcher becomes a bridge.

3.3 Discovering Communities by Roles 

The tree in Fig. 3 can be a coauthor network or a citation
network with the directions of edges ignored. General
community discovery algorithms like the GN algorithm cannot
discover its communities, because the betweenness of
each edge is the same, and there is no way to choose
the proper edge for deletion. However, nodes in
coauthor networks and citation networks play different
roles, and communities can be discovered according to
the roles of nodes.

The roles of nodes can be used to discover commu- 
nities. One way is to find the core nodes, and then 
assign non-core nodes to the proper core nodes to form 
communities. Algorithm 2 discovers communities by 
finding core nodes for each non-core node. 
Algorithm 2 Finding k communities by core nodes
Require: a network G;
  Calculate the topological centrality degrees of nodes and links;
  Distinguish roles of nodes and add the core nodes into CoreSet;
  for node x ∈ CoreSet do
    nodes(x) ← {x}
  end for
  for each non-core node x do
    Choose the nearest core nodes into CandidateSet as the candidate nodes;
    for node y ∈ CandidateSet do
      nodes(y) ← nodes(y) ∪ {x};
    end for
  end for
  while |CoreSet| > k do
    Merge the two most tightly connected communities;
  end while
  return k communities.
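The assignment loop of Algorithm 2 can be sketched as a level-by-level breadth-first search from each non-core node that stops at the first level containing a core node. This is only an illustrative reading: the function name is hypothetical, the core set is assumed to be given, the tree edge list is inferred from the text, and the final merging loop is left out.

```python
def assign_to_nearest_cores(adj, core_set):
    """Return {core: community}; each non-core node joins all of its nearest cores."""
    communities = {c: {c} for c in core_set}
    for x in adj:
        if x in core_set:
            continue
        seen, frontier, found = {x}, [x], []
        # Expand one BFS level at a time so that ties in distance are all kept.
        while frontier and not found:
            next_frontier = []
            for u in frontier:
                for w in adj[u]:
                    if w not in seen:
                        seen.add(w)
                        next_frontier.append(w)
                        if w in core_set:
                            found.append(w)
            frontier = next_frontier
        for c in found:
            communities[c].add(x)
    return communities


# Assumed edge list of the 16-node tree (inferred from the text).
tree_edges = [(1, 4), (1, 5), (1, 6), (1, 8), (1, 7), (2, 7), (2, 9), (2, 10),
              (2, 11), (2, 12), (3, 12), (3, 13), (3, 14), (3, 15), (3, 16)]
adj = {v: [] for v in range(1, 17)}
for u, v in tree_edges:
    adj[u].append(v)
    adj[v].append(u)

communities = assign_to_nearest_cores(adj, core_set={1, 2, 3})
```

Nodes 7 and 12 are equally close to two core nodes, so each of them is placed in two communities, which matches the remark that bridge nodes are often classified into several communities.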



The time complexity of Algorithm 2 is $O(n(n+m))$. The
number of core nodes can be controlled by setting the
threshold of $\#L(n)/\#N(n)$. If a node has more than one
candidate core node, then it is classified
into several communities; bridge nodes are
often classified into several communities at the same
time.

This way can discover communities globally in a
network. If there are too many communities, the
closely connected communities can be merged into larger
communities. Closely connected communities may share
many nodes and links, or there may be many external
connections between them. Given the target number of
communities k, Algorithm 3 merges communities.

Another way is to find the core nodes first, and
then expand from a node to form local communities.
According to the roles of nodes, the community expansion
needs to consider the following cases.

1. Forming a local community from a core node.
Algorithm 4 discovers the local community
of a core node. A community may have more
than one core node. If two communities share many
common nodes and links, then the two communities
can be merged into a larger community. This
way can find the research groups in a coauthor
network, and can find the paper clusters related to a
specific topic in a citation network.

2. Forming a local community from a non-core node.
To find local communities from a non-core node,
it is necessary to find the core nodes connected
to the node. Before finding the communities of a
non-core node, all the core nodes in the network should
be found first. Then the local communities are expanded
from each of the nearest core nodes connected to the
non-core node.



Algorithm 3 Merging communities
Require: the number of communities k;
  Step 1. If the number of communities is no more than k,
  then goto Step 4.
  Step 2. Calculate the Jaccard similarity of the node sets
  of each community pair. Suppose A and B are two
  communities; the Jaccard similarity of A and B is
  calculated by

  $$Jaccard(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$

  If the Jaccard similarities of all community pairs
  equal 0, then goto Step 3; else, find the community
  pairs that have the largest Jaccard similarity, and merge
  each such pair into a larger community. Goto Step 1.

  Step 3. Count the external links between community
  pairs. An external link has its two end nodes in two
  different communities. If the numbers
  of external links of all pairs equal 0, then goto Step 4; else,
  find the community pairs that have the most external
  links, and merge each such pair into a larger community.
  Goto Step 1.

  Step 4. Stop merging communities.



3. Finding the local community of a set of nodes.
Given a set of nodes, the local community can be
found as follows.

a) For each node, find the core nodes connected
to it until the topological center is found; all
these core nodes are added to coreSet.

b) Build the subgraph containing the given nodes
and the nodes in coreSet.

c) Expand the local community from the nodes
in coreSet.



Algorithm 4 Expanding community from a core node
Require: A core node c and a connected network G;
1: nodeQueue ← {c}, nodeSet ← {c}, linkSet ← {};
2: while nodeQueue ≠ {} do
3:   Fetch a node x from nodeQueue;
4:   for each neighbor node y of x do
5:     Distinguish the role of y;
6:     if (y ∉ nodeSet) and (y is not a core node) and
       (nodeWeight(y) < nodeWeight(x)) then
7:       nodeQueue ← nodeQueue ∪ {y};
8:       nodeSet ← nodeSet ∪ {y};
9:       linkSet ← linkSet ∪ {link(x, y)};
10:    end if
11:  end for
12: end while
13: return linkSet.
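
Algorithm 4 can be sketched in Python as below. This is a hedged illustration under several assumptions: the network is given as a hypothetical adjacency dictionary (neighbors), node weights and the set of core nodes are assumed to be already computed and are passed in as weight and core_nodes, the queue is processed first-in first-out, and the function returns the node set together with the link set for convenience. The example graph and its weights are invented for the illustration.

```python
from collections import deque


def expand_community(core, neighbors, weight, core_nodes):
    """Expand a local community from a core node, following Algorithm 4."""
    queue = deque([core])
    node_set = {core}
    link_set = set()
    while queue:
        x = queue.popleft()  # assumption: nodes are fetched in FIFO order
        for y in neighbors.get(x, ()):
            # Absorb y only if it is unvisited, is not a core node,
            # and has a lower weight than the node it is reached from.
            if (y not in node_set and y not in core_nodes
                    and weight[y] < weight[x]):
                queue.append(y)
                node_set.add(y)
                link_set.add((x, y))
    return node_set, link_set


# Hypothetical network: c and d are core nodes, the rest are non-core.
neighbors = {
    "c": ["a", "b", "d"],
    "a": ["c", "e"],
    "b": ["c"],
    "d": ["c", "f"],
    "e": ["a"],
    "f": ["d"],
}
weight = {"c": 1.0, "d": 0.9, "a": 0.6, "b": 0.5, "f": 0.4, "e": 0.3}
core_nodes = {"c", "d"}

nodes, links = expand_community("c", neighbors, weight, core_nodes)
```

In this example the expansion from c stops at the other core node d, so f, which is reachable only through d, stays outside the community of c.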



Fig. 7 shows a segment of network with TC degrees of 
nodes. We can find a local community from a core node, 
a non-core node, and a set of nodes as follows. 
[Fig. 7 legend: topological center; core node; non-core node]

Fig. 7. A simple case for finding community: circle nodes 
are core nodes; square nodes are non-core nodes. 

1. Finding the local community of core node B. The
process is shown in Table 3.



TABLE 3
Finding local community of core node B

Step  Node  nodeQueue    nodeSet                     Expanded
0     B     B            B                           C, D, E
1     C     D, E         B, C
2     D     E            B, C, D                     F, G, H
3     E     F, G, H      B, C, D, E                  I, J
4     F     G, H, I, J   B, C, D, E, F
5     G     H, I, J      B, C, D, E, F, G
6     H     I, J         B, C, D, E, F, G, H
7     I     J            B, C, D, E, F, G, H, I
8     J                  B, C, D, E, F, G, H, I, J


2. Finding the local community of non-core node F.
First find the nearest core node D, then expand
the local community from D. The expansion process
is shown in Table 4.



TABLE 4
Finding local community of non-core node F

Step  Node  nodeQueue  nodeSet      Expanded
0     D     D          D, F         G, H
1     G     H          D, F, G
2     H                D, F, G, H



3. Finding the local community of a node set {D, I, J}.
D is a core node, while I and J are two non-core
nodes. If D were the core node of the community
containing I and J, then {D, I, J} would form the
local community. However, D is not the core node
of the community containing I and J. The possible
core nodes of the community containing D are
{D, B, A}; the possible core nodes of the community
containing I and J are the same for both nodes,
namely {E, B, A}. We can therefore construct the
subgraph containing nodes D, I and J and their
possible core nodes D, E, B and A, as shown in
Fig. 8. From the subgraph, we know that B is the
nearest core node of the community containing D,
I and J. We can then expand from B to find the
local community containing nodes D, I and J, as
described in case (1).

Fig. 8. Subgraph containing nodes D, I, J and the possible
core nodes D, E, B and A.

In a research network, this approach can find the research
team members of a researcher in a coauthor network and the
topic-related papers of a paper in a citation network.

Given a set of papers, the coauthor relations form the
coauthor network, and the citation relations form the
citation network. After the TC degrees are calculated, the
research groups can be discovered, and the papers can be
clustered by citation relations. Researchers in the same
community may share similar research interests,
while papers in the same cluster are topic related.
Topic-related papers can be recommended to researchers
having similar research interests. Global communities
show the research groups and research topics in the paper
set, while local community expansion helps recommend
papers in a large paper set to appropriate readers.

When making a funding decision, it is necessary to
evaluate the status of a research group, its cooperators,
and its publications. The discovered communities in the
coauthor network show the research groups of a research
area, while the discovered communities in the citation
network show the paper clusters in the research area. In
addition, the roles of a researcher and his/her
publications can be distinguished by TC degrees.

4 Application: Discovering Backbone in 
Research Network 

Given a set of research papers, research networks
such as coauthor networks and citation networks can
be constructed. Metadata of papers in computer
science are often stored in BibTeX or XML files
provided by online digital libraries such as Google
Scholar, ACM Portal (http://portal.acm.org), IEEE
digital library (http://ieeexplore.ieee.org), DBLP
(http://www.informatik.uni-trier.de/~ley/db/) and
Citeseer (http://citeseer.ist.psu.edu).

4.1 Structures of Research Network 

Researchers and the coauthor relation form the coauthor
network. The coauthors of a paper form the motif [16]
of the research network. A coauthor relation from A to B
means that A and B are coauthors of the same paper,
and A is before B in the author list.

Fig. 9 shows the structure of the coauthor network.
With the directions of coauthor relations ignored, each
motif describes the cooperation between the authors of a
paper: a loop for a sole author, an edge between two
authors, a triangle for three authors, and a complete
graph for n (n > 3) authors. The coauthor network has three
layers from the local view to the global view: the motif
layer, the module layer and the global layer. Nodes' degrees
in the coauthor network reflect the active degrees of
researchers. The in-links reflect the hub characteristic,
while out-links reflect authority.







[Fig. 9 labels: global layer; module layer (Module 1, Module 2, more modules); motif layer (1 author, 2 authors, 3 authors, 4 authors, more authors)]



Fig. 9. Structure of coauthor network from local view to
the global view: (1) the bottom layer contains the motifs;
(2) the middle layer contains the modules combining one
or more motifs; (3) the top layer contains the networks of
modules.

Our first dataset collects the papers of the International
Semantic Web Conference (ISWC) from 2002 to 2007.
It contains 935 researchers and 401 papers, with 2286
coauthor relations and 1362 authorOf relations. The number
of citation relations is 236, because citations are counted
only when both papers are in ISWC. Fig. 12 shows the node
TC degrees of the largest module of the coauthor networks
in a circular layout. The central nodes have higher TC
degrees, and the topological centers have the highest
centrality, 1. From a topological center to the margins,
the TC degrees decrease step by step. If the number of
nodes is very large, the TC degrees become very small, so
the function log() is used to map the TC degrees from the
interval (0, 1] to (−21, 0], and the order of the node TC
degrees is preserved.

Fig. 10 shows the modules in coauthor network of 
ISWC dataset. It contains 147 modules, 935 researchers 
and 2286 coauthor relations. Fig. 11 shows the largest 
module of Fig. 10. It contains 370 researchers and 1227 
coauthor relations. 

The number of coauthor relations between two researchers
reflects the frequency of their cooperation. Node degrees
in the coauthor network reflect the active degrees of
researchers. The in-links reflect the hub characteristic,
while out-links reflect authority.

Fig. 10. Coauthor networks of ISWC data set: 147 modules,
935 researchers and 2286 coauthor relations.

Fig. 11. The largest module of coauthor networks of
ISWC data set.

The density of a module is reflected by the frequency
of cooperation between researchers. The average
cooperation active degree of the researchers, called
cooperation density, can be used to assess the active
degree of a research community. Cooperation density is
the number of coauthor relations divided by the number
of researchers.

Fig. 12. The largest module of coauthor networks of
ISWC data set with weights: the topological centrality
degrees are transformed by function log().

Theorem 1. If a module M of the coauthor network has
n researchers, then the density of M lies within the
range [(n − 1)/n, n − 1].

Proof. Suppose M is a connected digraph with n nodes.
The lower bound of density: the number of edges is
at least n − 1, otherwise there will be some isolated
researchers. So the lower bound of the density of M is
(n − 1)/n. The upper bound of density: if there is at
most one directed edge in each direction between two
nodes, then the number of edges in M is at most
n(n − 1). So the upper bound of the density of M is
n − 1. Therefore, the density of a module M with n
nodes lies within the range [(n − 1)/n, n − 1]. □
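
The two bounds of Theorem 1 can be checked on small constructed graphs. The following sketch is only an illustration: the helper names module_density and density_bounds are assumptions, the sparsest connected case is modeled as a directed path, and the densest case is modeled as one directed edge for every ordered pair of distinct nodes.

```python
def module_density(num_edges, num_nodes):
    """Cooperation density: coauthor relations divided by researchers."""
    return num_edges / num_nodes


def density_bounds(n):
    """Range stated by Theorem 1 for a connected module with n nodes."""
    return (n - 1) / n, n - 1


n = 5
# Sparsest connected case: a directed path uses n - 1 edges.
path_edges = [(i, i + 1) for i in range(n - 1)]
# Densest case: one directed edge for every ordered pair of distinct nodes.
complete_edges = [(i, j) for i in range(n) for j in range(n) if i != j]

lower, upper = density_bounds(n)
sparse = module_density(len(path_edges), n)
dense = module_density(len(complete_edges), n)
```

For n = 5 the path gives 4 edges and the complete digraph gives 20 edges, so the two constructed densities coincide with the lower and upper bounds.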

A citation network is a directed acyclic graph (DAG).
Each paper has a fixed publishing time, and a paper
can only cite papers already published, so there are
no cycles in the citation network. Citation is direction
sensitive, and it implies the time sequence of the two
papers. Fig. 13a shows a module of a citation
network. Papers in the same module are topic related.
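
Whether a given set of citation relations is in fact acyclic can be checked with a standard topological sort. The sketch below uses Kahn's algorithm, which is a general graph technique and not part of the proposed approach; the function name is_acyclic and the paper identifiers are assumptions made for the illustration.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: return True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    successors = defaultdict(list)
    for citing, cited in edges:
        successors[citing].append(cited)
        indegree[cited] += 1
    # Start from papers that nobody cites and peel the graph layer by layer.
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every node is removed exactly when no cycle blocks the peeling.
    return removed == len(nodes)


# Hypothetical papers: p3 cites p2 and p1, and p2 cites p1.
papers = ["p1", "p2", "p3"]
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
valid = is_acyclic(papers, citations)
# A citation from p1 back to p3 would close a cycle.
broken = is_acyclic(papers, citations + [("p1", "p3")])
```

A failed check on real metadata usually points to data problems such as merged records or wrong publication dates rather than to a genuine cycle.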

Citation relations show the relevance between research 
papers, and paper communities can be discovered by 
citation relations. Citations in the community show the 
relevance between papers, while citations between paper 
communities show the relevance of research topics. 

Fig. 13b shows the modules in the citation network of
the ISWC dataset. It contains 36 modules, and the largest
module contains 142 papers and 165 citation relations.
All the citation relations are between papers published in
ISWC. The connectivity density of the citation network is
lower than that of the coauthor network. The citation
density of a module reflects the relevance between its
papers. The citation density is the number of citations
divided by the number of papers.
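
As a quick check of the comparison above, the densities can be computed directly from the counts reported in this section. The helper name density is an assumption; the numbers are the ones given in the text for the ISWC dataset.

```python
def density(num_relations, num_nodes):
    """Number of relations divided by the number of nodes."""
    return num_relations / num_nodes


# Whole networks: 2286 coauthor relations among 935 researchers,
# and 236 citation relations among 401 papers.
coauthor_all = density(2286, 935)      # about 2.44
citation_all = density(236, 401)       # about 0.59

# Largest modules: 1227 coauthor relations among 370 researchers,
# and 165 citation relations among 142 papers.
coauthor_largest = density(1227, 370)  # about 3.32
citation_largest = density(165, 142)   # about 1.16

# The largest coauthor module also lies within the range of Theorem 1.
within_theorem_1 = (370 - 1) / 370 <= coauthor_largest <= 370 - 1
```

In both comparisons the citation density is lower than the coauthor density, which is consistent with the statement above.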

4.2 Topological Centrality based Backbone Network 

In a network, after roles of nodes are distinguished by 
the node TC degrees, core nodes and edges among them 
form a subgraph, called backbone network. The end nodes 
of edges in the backbone network are both core nodes. 
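
Extracting a backbone network from precomputed TC degrees can be sketched as below. This is an illustration under stated assumptions: the function name backbone_network is hypothetical, a node is treated as a core node when its TC degree is at least a threshold (0.5 is used as the default here because that threshold is used for the ISWC example), and the TC values in the example are invented.

```python
def backbone_network(tc, edges, core_threshold=0.5):
    """Return the core nodes and the edges whose end nodes are both core."""
    # Assumption: a node counts as core when its TC degree reaches the threshold.
    core = {node for node, degree in tc.items() if degree >= core_threshold}
    links = [(u, v) for u, v in edges if u in core and v in core]
    return core, links


# Hypothetical TC degrees and edges.
tc = {"a": 1.0, "b": 0.7, "c": 0.2, "d": 0.55}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
core, links = backbone_network(tc, edges)
```

In this example d is a core node but has no link in the backbone, because its only bridge to the other core nodes is the non-core node c.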

The backbone network consists of core nodes. It is 
useful for visualization and browsing, and can play the 
following roles in scientific research: 

1. It helps display the research network at different
levels. Each community can be represented by the
core nodes in the backbone network. When a core
node is focused, the detailed information of its local
community can be browsed.

[Fig. 13a labels: papers in other research topics; publishing time]

Fig. 13. Structure and instance of citation network: (a)
The structure of citation network, (b) The citation network
of ISWC data set.


2. It shows the important researchers in a coauthor
network. When a research community or research
group is mentioned, the leaders of the community
or the head of the research group are well known.
Fig. 14 shows the backbone network of the largest
module of the coauthor networks of the ISWC data
set. The threshold of the core nodes is 0.5, and
the threshold of the margin nodes is 0. The backbone
contains all of the core nodes and the coauthor
relations among them. Most of the core nodes are
connected, which verifies the "rich club" phenomenon
[17]: richer nodes are more likely to be connected
with other richer nodes. Some core nodes form
connected components on their own, because the
bridge nodes between them and the other core nodes
are non-core nodes.

3. The backbone network of a coauthor network can be
used to propagate information. A coauthor network
is a kind of social network. Core nodes are important
during information propagation because they have
more impact in their communities. For example, if
invitations to PC members need to be sent, the
researchers in the backbone network should take
priority.

4. Papers form communities via the citation relations,
and papers in a community share the same or
relevant research topics. Core nodes are often
important papers that cite or are cited by other
important papers. The backbone network of a citation
network helps trace the development and history
of a research area or a research topic. Core nodes
and their neighbors reflect the main achievements at
different research stages.

5. Paper publication venue network contains confer- 
ences and journals. Other research resources such 
as researchers, papers and publishers connect con- 
ferences and journals into a connected network. To 
find the citations among conferences and journals, 
the sub-network containing conferences, journals 
and papers can be built. If a super node represents 
the conference or journal containing papers, then 
citation relations in the super nodes and between 
different super nodes can be counted. The number 
of external citations reflects the relevance of con- 
ferences and journals. 
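
The counting of citations inside and between super nodes described in item 5 can be sketched as follows. This is a minimal illustration: the function name venue_citation_counts, the mapping venue_of, and the venue labels V1 to V3 are assumptions, and each paper is assumed to belong to exactly one conference or journal.

```python
from collections import Counter


def venue_citation_counts(venue_of, citations):
    """Count citations inside each super node and between super nodes."""
    internal = Counter()
    external = Counter()
    for citing, cited in citations:
        source, target = venue_of[citing], venue_of[cited]
        if source == target:
            internal[source] += 1
        else:
            # Keep the direction: papers of source cite papers of target.
            external[(source, target)] += 1
    return internal, external


# Hypothetical papers and venues.
venue_of = {"p1": "V1", "p2": "V1", "p3": "V2", "p4": "V3"}
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p3")]
internal, external = venue_citation_counts(venue_of, citations)
```

The external counter then gives, for each ordered pair of venues, the number of external citations that item 5 uses as an indicator of relevance.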




Fig. 14. Backbone network of the largest module of 
coauthor network of ISWC Dataset from 2002 to 2007. 

Similarly, the relevance of publishers' businesses, 
projects, and institutions can be analyzed. The rele- 
vance of publishers is reflected by the relevance between 
books and papers published by them. The relevance 
among projects is reflected by the cooperation between 
researchers taking part in the projects and citations be- 
tween papers supported by the projects. The relevance 
between institutions can also be reflected by the rele- 
vance between researchers and papers. 

4.3 Evolution of Backbone Networks 

Backbone networks can be used to study the development
of scientific research. Backbone networks sorted by
year reflect the evolution of research networks.
Similarly, the evolution of backbone networks in citation
networks, paper venue networks, institution networks,
etc. can be studied.

Fig. 15 shows the evolution of the coauthor network of
ISWC from 2002 to 2008. The coauthor networks are
accumulated year by year; that is, the coauthor network
of year n (2002 ≤ n ≤ 2008) contains the coauthor
relations from year 2002 to year n.
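
The year-by-year accumulation can be sketched as follows. The function name cumulative_networks and the input layout (a dictionary from year to the coauthor relations of papers published in that year) are assumptions, and the example relations are invented. A Counter is used so that repeated cooperation between the same two researchers is kept as a multiplicity.

```python
from collections import Counter


def cumulative_networks(relations_by_year):
    """Map each year n to all coauthor relations published up to year n."""
    accumulated = Counter()
    snapshots = {}
    for year in sorted(relations_by_year):
        accumulated.update(relations_by_year[year])
        # Copy the running total so later years do not alter this snapshot.
        snapshots[year] = Counter(accumulated)
    return snapshots


# Hypothetical relations: a and b cooperate again in 2004.
relations_by_year = {
    2002: [("a", "b")],
    2003: [("b", "c")],
    2004: [("a", "b"), ("c", "d")],
}
snapshots = cumulative_networks(relations_by_year)
```

The TC degrees and the backbone network of each year would then be computed on the corresponding snapshot.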

The evolution of the coauthor network reflects the
history of ISWC. More and more researchers have taken
part in the conference, while the nodes and links in
the backbone networks are also changing. The following
characteristics of the evolution of coauthor networks
can be observed:

1. New researchers in the coauthor network often
cooperate with researchers who have already
published papers at ISWC, as shown by the modules
in the coauthor networks growing larger year by
year.

2. Scientific researchers tend to cooperate with
others. The evolution graph shows that the isolated
nodes enter the connected components step by
step.

3. Core researchers tend to cooperate with each
other. The number of researchers in the largest
modules of the backbone networks grows larger
and larger. This reflects the "rich club"
phenomenon [17] in scientific research.

4. Core researchers are active locally, and they have 
more cooperators than their neighbors. The roles 
of researchers in coauthor network are also chang- 
ing: new researchers may become core researchers, 
while core researchers may become middle nodes 
or margin nodes. 

5. The topological centers of the largest module are 
changing. The topological centers emerge through 
a voting-like mechanism. Table 5 shows the topo- 
logical centers. 



TABLE 5
Topological centers of coauthor networks of ISWC from
2002 to 2008

Year  #Researcher  #Cooperation  Topological Center
2002  99           174           Katia P. Sycara
2003  262          510           Katia P. Sycara
2004  393          877           Steffen Staab
2005  570          1310          Steffen Staab
2006  753          1872          Guus Schreiber
2007  897          2290          Guus Schreiber
2008  1024         2647          Guus Schreiber



5 Discussions 

5.1 About the Topological Centrality

The TC degree of a node reflects the geodesic distance
to the nearest topological center in the network. The
value of a TC degree has no definite explanation, but
it is different from the ranking results of PageRank.
TC degrees are closely related to the authority of
nodes: authoritative nodes have higher TC degrees
than their neighbors. The authority of a node reflects
its importance in information propagation. TC degrees
are explainable within communities: core nodes
have higher TC degrees than their neighbors. Isolated
resources have less influence in the global society.

Fig. 15. Evolution of coauthor network of ISWC from
2002 to 2008: each row shows the coauthor network and
its backbone network; the left column shows the coauthor
network, while the right column shows the backbone
network.

Backbone networks can help study relations between
resources of different types. The backbone network of a
heterogeneous research network connects the important
resources of a research topic; these resources may be
researchers, papers, conferences, journals, institutions
and publishers. This helps find and recommend
information. Furthermore, related information can be
displayed by a browser based on interactive visualization.

In general complex networks, edges have no semantics,
while in semantics-rich networks, edges carry semantic
relations. Weights of nodes are affected by their
neighbors, and different relations have different effects.
So it is necessary to consider the influences of relations
on the topological centrality calculation. Relations can
be assigned different weights and take part in the
iterative calculation as shown in Eq. (5), where r is the
relation of link e(i, j) and ω_r is the weight of r, which
affects the calculation of TC in each iteration:



\mathit{temp}\,\omega_i^{(t+1)} = \mathit{temp}\,\omega_i^{(t)} + \omega_r \cdot \mathit{temp}\,\omega_j^{(t)} \qquad (5)


An important characteristic is that the original topo- 
logical centers may change when we merge two net- 
works into one by certain links and recalculate the topo- 
logical centers in the new network. For example, if we 
merge the coauthor network with the citation network 
by the authorOf semantic links, the topological centers of 
the new network may not be simply the sum of the topo- 
logical centers in the coauthor network and those in the 
citation network. Recalculation of topological centers can 
synthesize more relations, so this can more accurately 
evaluate nodes. For example, authors can be evaluated 
by more factors (e.g., number of publications, number 
of co-authors, number of citations) in the new network 
than in the old networks. If an application needs to keep
the old topological centers in the new network and avoid
recalculation, we can adopt the following strategy: find
the relations (e.g., authorOf) between the old topological
centers and then compose the corresponding old
topological centers to form new topological centers. Such
integrated topological centers can provide semantically
relevant information services (e.g., an authoritative
author and his/her high-impact papers can be obtained at
the same time) for applications in large networks.

5.2 Related Works

General community discovery approaches are based on
the connections between vertices in a network. A fast
community discovery algorithm for very large networks
was proposed with approximately linear time complexity
O(n log^2 n), where n is the number of nodes [18].
General methods like the GN algorithm can be used to
discover communities in weighted networks by mapping
them onto unweighted networks [19].

Research and learning resources form a network, and
the connections are the relations among resources.
Different from the communities in general complex networks,
semantic communities in the relational network were
discovered according to the roles of relations during
reasoning on relations [20].

Many works study the collaboration networks and
citation networks of scientific research, and most of
them focus on the characteristics of collaboration
networks. For the structure of the social science
collaboration network, disciplinary cohesion from 1963
to 1999 was studied [21]. The structure of scientific
collaboration networks, including shortest paths,
weighted networks, and centrality, was studied [22] [23]
[24]. Coauthor relations were used to study the
collaborations between researchers, especially
mathematicians, and the distribution of relations between
papers of Mathematical Reviews against the number of
authors was studied [25] [26]. Relations between
researchers were analyzed in the Erdős collaboration
graph, and the shortest path lengths between researchers
were studied [27].

Evolutions of the social networks of scientific collabo- 
rations in mathematics and neuro-science were studied 
[28]. The research result shows that the social network 
of collaboration network is scale-free; and, the node 
separation decreases with the increase of connections. 

Social network in academic research can be extracted 
from the webpages and paper metadata provided by 
the online databases [29]; furthermore, relations among 
researchers are mined in academic social networks [30]. 
Social structure in scientific research was studied based 
on the citations [31]. 

The citation relations between scientific papers and the
citation distribution of papers were studied [32] [33]
[34]. The results show that some papers are not cited at
all, most papers are cited once, and a small fraction of
papers covers the references of most papers in a research
area.

Resources in research networks can be ranked at the
object level. Research resources were ranked by the
PopRank approach, which considers the mutual influences
between relevant resources [35]. Object-based ranking
can help search and recommend different resources such
as papers, conferences, journals and researchers.

Researchers and papers are often ranked in coauthor 
network and citation network respectively. A co-ranking 
framework of researchers and papers was proposed, in 
which researchers and papers were ranked in a hetero- 
geneous network combining the coauthor network and 
citation network by coauthor relations [36]. 



Our approach is different from the existing approaches
in the following aspects:

1. We distinguish the roles of nodes by topological
centrality, and then discover global and local
communities according to these roles. Our approach
is therefore based on roles rather than only on
connections. Although the topological centrality
degrees of nodes and edges are calculated from the
connections between nodes, the topological centrality
degrees of neighboring nodes also influence each
other. The role-based community discovery approach
suits research networks, and it can discover
communities in tree-like networks, which are hard to
handle for general community discovery approaches
such as the GN algorithm.

2. We have built the backbone networks for coauthor
networks and citation networks, and studied the
evolution characteristics of backbone networks.
The PageRank algorithm can also find the local
core nodes, but it has no way to connect most of
the core nodes into a backbone network, because it
is hard to choose the connecting nodes between the
core nodes by PageRank values. In contrast,
topological centrality degrees can be used to choose
the core nodes and connect as many of them as
possible into a connected backbone network, because
the core nodes include both the community central
nodes and the important nodes connecting different
communities. The backbone network construction
approach is also based on topological centrality.
The approach can be applied not only to research
networks with a single resource type but also to
those with multiple resource types.



6 Conclusion 

This paper first proposes the notion of topological cen- 
trality and the calculation approach to reflect the topo- 
logical positions of nodes and edges in a network, and 
then studies its applications in discovering communities 
and building the backbone network in scientific research 
networks. Research communities can be discovered ac- 
cording to the roles of nodes distinguished by topolog- 
ical centrality degrees. We also propose an approach to 
building the backbone network by using the topological 
centrality. Experiments on real research networks and
simulated networks show the feasibility and effectiveness
of our approaches.